851 research outputs found

    Sensitive and accurate detection of copy number variants using read depth of coverage

    Methods for the direct detection of copy number variation (CNV) genome-wide have become effective instruments for identifying genetic risk factors for disease. The application of next-generation sequencing platforms to genetic studies promises to improve sensitivity to detect CNVs as well as inversions, indels, and SNPs. New computational approaches are needed to systematically detect these variants from genome sequence data. Existing sequence-based approaches for CNV detection are primarily based on paired-end read mapping (PEM) as reported previously by Tuzun et al. and Korbel et al. Due to limitations of the PEM approach, some classes of CNVs are difficult to ascertain, including large insertions and variants located within complex genomic regions. To overcome these limitations, we developed a method for CNV detection using read depth of coverage. Event-wise testing (EWT) is a method based on significance testing. In contrast to standard segmentation algorithms that typically operate by performing likelihood evaluation for every point in the genome, EWT works on intervals of data points, rapidly searching for specific classes of events. Overall false-positive rate is controlled by testing the significance of each possible event and adjusting for multiple testing. Deletions and duplications detected in an individual genome by EWT are examined across multiple genomes to identify polymorphism between individuals. We estimated error rates using simulations based on real data, and we applied EWT to the analysis of chromosome 1 from paired-end shotgun sequence data (30x) on five individuals. Our results suggest that analysis of read depth is an effective approach for the detection of CNVs, and it captures structural variants that are refractory to established PEM-based methods
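The interval-based testing idea in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' EWT implementation: it assumes per-window read depth is roughly normal, that windows are independent, and that a per-window tail-probability threshold is chosen so the joint probability over a run of `run_len` windows equals a Bonferroni-adjusted false-positive rate. The function name, parameters, and depth values are all hypothetical.

```python
from statistics import NormalDist, mean, stdev


def ewt_events(depth, run_len=3, fpr=0.05):
    """Return (start, end, kind) for runs of windows with jointly extreme depth.

    Illustrative only: assumes per-window depth is roughly normal and that
    windows are independent. Overlapping runs are reported separately.
    """
    mu, sigma = mean(depth), stdev(depth)
    std_normal = NormalDist()
    z = [(d - mu) / sigma for d in depth]
    upper = [1.0 - std_normal.cdf(v) for v in z]  # small when depth is high
    lower = [std_normal.cdf(v) for v in z]        # small when depth is low

    # Under independence, P(all run_len windows fall below t) = t ** run_len.
    # Setting that equal to fpr / n_tests gives the per-window threshold t.
    n_tests = max(1, len(depth) // run_len)
    threshold = (fpr / n_tests) ** (1.0 / run_len)

    events = []
    for start in range(len(depth) - run_len + 1):
        end = start + run_len
        if max(upper[start:end]) < threshold:
            events.append((start, end, "duplication"))
        elif max(lower[start:end]) < threshold:
            events.append((start, end, "deletion"))
    return events


# Made-up depths: about 30x everywhere, with a drop to 15x in windows 20-22.
depth = [29 if i % 2 == 0 else 31 for i in range(60)]
depth[20:23] = [15, 15, 15]
print(ewt_events(depth))
```

Testing whole intervals rather than single windows is what lets a lenient per-window threshold still give a strict event-level error rate.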

    The effects of common structural variants on 3D chromatin structure

    BACKGROUND: Three-dimensional spatial organization of chromosomes is defined by highly self-interacting regions 0.1-1 Mb in size termed Topological Associating Domains (TADs). Genetic factors that explain dynamic variation in TAD structure are not understood. We hypothesize that common structural variation (SV) in the human population can disrupt regulatory sequences and thereby influence TAD formation. To determine the effects of SVs on 3D chromatin organization, we performed chromosome conformation capture sequencing (Hi-C) of lymphoblastoid cell lines from 19 subjects for which SVs had been previously characterized in the 1000 Genomes Project. We tested the effects of common deletion polymorphisms on TAD structure by linear regression analysis of nearby quantitative chromatin interactions (contacts) within 240 kb of the deletion, and we specifically tested the hypothesis that deletions at TAD boundaries (TBs) could result in large-scale alterations in chromatin conformation. RESULTS: Large (> 10 kb) deletions had significant effects on long-range chromatin interactions. Deletions were associated with increased contacts that span the deleted region, and this effect was driven by large deletions that were not located within a TAD boundary (nonTB). Some deletions at TBs, including an 80 kb deletion of the genes CFHR1 and CFHR3, had detectable effects on chromatin contacts. However, for TB deletions overall, we did not detect a pattern of effects that was consistent in magnitude or direction. Large inversions in the population had a distinguishable signature characterized by a rearrangement of contacts that span their breakpoints. CONCLUSIONS: Our study demonstrates that common SVs in the population impact long-range chromatin structure, and deletions and inversions have distinct signatures. However, the effects that we observe are subtle and variable between loci. Genome-wide analysis of chromatin conformation in large cohorts will be needed to quantify the influence of common SVs on chromatin structure
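The regression of contact counts on deletion genotype mentioned in the abstract above might look like the sketch below. The abstract does not give the model details (normalization, covariates, genotype coding), so this assumes a plain ordinary least squares fit of a contact measure on the number of deleted copies (0, 1, or 2). The function name and the data are made up for illustration.

```python
def ols_fit(x, y):
    """Ordinary least squares fit of y on x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: deleted copies per subject and a contact measure
# for one pair of loci flanking the deletion.
deleted_copies = [0, 0, 1, 1, 2, 2]
contacts = [10.0, 12.0, 15.0, 17.0, 20.0, 22.0]

slope, intercept = ols_fit(deleted_copies, contacts)
print(f"contacts change by {slope:.2f} per deleted copy (intercept {intercept:.2f})")
```

A positive slope in such a fit would correspond to the reported pattern of increased contacts spanning the deleted region.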

    PROBER: oligonucleotide FISH probe design software

    PROBER is an oligonucleotide primer design software application that designs multiple primer pairs for generating PCR probes useful for fluorescence in situ hybridization (FISH). PROBER generates Tiling Oligonucleotide Probes (TOPs) by masking repetitive genomic sequences and delineating essentially unique regions that can be amplified to yield small (100-2000 bp) DNA probes that in aggregate will generate a single, strong fluorescent signal for regions as small as a single gene. TOPs are an alternative to bacterial artificial chromosomes (BACs) that are commonly used for FISH but may be unstable, unavailable, chimeric, or non-specific to small (10-100 kb) genomic regions. PROBER can be applied to any genomic locus, with the limitation that the locus must contain at least 10 kb of essentially unique blocks. To test the software, we designed a number of probes for genomic amplifications and hemizygous deletions that were initially detected by Representational Oligonucleotide Microarray Analysis of breast cancer tumors. AVAILABILITY: http://prober.cshl.ed

    Infant EEG activity as a biomarker for autism: a promising approach or a false promise?

    The ability to determine an infant's likelihood of developing autism via a relatively simple neurological measure would constitute an important scientific breakthrough. In their recent publication in this journal, Bosl and colleagues claim that a measure of EEG complexity can be used to detect, with very high accuracy, infants at high risk for autism (HRA). On the surface, this appears to be that very scientific breakthrough and as such the paper has received widespread media attention. But a close look at how these high accuracy rates were derived tells a very different story. This stems from a conflation between "high risk" as a population-level property and "high risk" as a property of an individual. We describe the approach of Bosl et al. and examine their results with respect to baseline prevalence rates, the inclusion of which is necessary to distinguish infants with a biological risk of autism from typically developing infants with a sibling with autism. This is an important distinction that should not be overlooked
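The role of baseline prevalence in the argument above can be made concrete with Bayes' rule. The sketch below computes the positive predictive value of a classifier from its sensitivity, specificity, and the prevalence of the outcome. The numbers are purely illustrative and are not taken from Bosl et al. or from the commentary.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(outcome | positive test), by Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Illustrative values only: the same classifier looks very different
# depending on how common the outcome is among the infants tested.
for prevalence in (0.5, 0.2, 0.01):
    ppv = positive_predictive_value(0.8, 0.8, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}")
```

Under these assumed values, a classifier with fixed sensitivity and specificity gives a lower positive predictive value as the outcome becomes rarer, which is why accuracy measured at the group level does not carry over directly to the risk of an individual infant.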

    Representational oligonucleotide microarray analysis: A high-resolution method to detect genome copy number variation

    We have developed a methodology we call ROMA (representational oligonucleotide microarray analysis), for the detection of genomic aberrations in cancer and normal humans. By arraying oligonucleotide probes designed from the human genome sequence, and hybridizing with "representations" from cancer and normal cells, we detect regions of the genome with altered "copy number." We achieve an average resolution of 30 kb throughout the genome, and resolutions as high as a probe every 15 kb are practical. We illustrate the characteristics of probes on the array and accuracy of measurements obtained using ROMA. Using this methodology, we identify variation between cancer and normal genomes, as well as between normal human genomes. In cancer genomes, we readily detect amplifications and large and small homozygous and hemizygous deletions. Between normal human genomes, we frequently detect large (100 kb to 1 Mb) deletions or duplications. Many of these changes encompass known genes. ROMA will assist in the discovery of genes and markers important in cancer, and the discovery of loci that may be important in inherited predispositions to disease

    Copy-Number Variants in Patients with a Strong Family History of Pancreatic Cancer

    Copy-number variants such as germ-line deletions and amplifications are associated with inherited genetic disorders including familial cancer. The gene or genes responsible for the majority of familial clustering of pancreatic cancer have not been identified. We used representational oligonucleotide microarray analysis (ROMA) to characterize germ-line copy number variants in 60 cancer patients from 57 familial pancreatic cancer kindreds. Fifty-seven of the 60 patients had pancreatic cancer and three had nonpancreatic cancers (breast, ovary, ovary). A familial pancreatic cancer kindred was defined as a kindred in which at least two first-degree relatives have been diagnosed with pancreatic cancer. Copy-number variants identified in 607 individuals without pancreatic cancer were excluded from further analysis. A total of 56 unique genomic regions with copy-number variants not present in controls were identified, including 31 amplifications and 25 deletions. Two deleted regions were observed in two different patients, and one in three patients. The germ-line amplifications had a mean size of 662 Kb, a median size of 379 Kb (range 8.2 Kb to 2.5 Mb) and included 425 known genes. Examples of genes included in the germ-line amplifications include the MAFK, JunD and BIRC6 genes. The germ-line deletions had a mean size of 375 Kb, a median size of 151 Kb (range 0.4 Kb to 2.3 Mb) and included 81 known genes. In multivariate analysis controlling for region size, deletions were 90% less likely to involve a gene than were duplications (p < 0.01). Examples of genes included in the germ-line deletions include the FHIT, PDZRN3 and ANKRD3 genes. Selected deletions and amplifications were confirmed using real-time PCR, including a germ-line amplification on chromosome 19. These genetic copy-number variants define potential candidate loci for the familial pancreatic cancer gene

    Estimating genome-wide copy number using allele specific mixture models

    Genomic changes such as copy number alterations are thought to be one of the major underlying causes of human phenotypic variation among normal and disease subjects [23,11,25,26,5,4,7,18]. These include chromosomal regions with so-called copy number alterations: instead of the expected two copies, a section of the chromosome for a particular individual may have zero copies (homozygous deletion), one copy (hemizygous deletion), or more than two copies (amplification). The canonical example is Down syndrome, which is caused by an extra copy of chromosome 21. Identification of such abnormalities in smaller regions has been of great interest, because it is believed to be an underlying cause of cancer. More than one decade ago, comparative genomic hybridization (CGH) technology was developed to detect copy number changes in a high-throughput fashion. However, this technology only provides a 10 Mb resolution, which limits the ability to detect copy number alterations spanning small regions. It is widely believed that a copy number alteration as small as one base can have significant downstream effects, thus microarray manufacturers have developed technologies that provide much higher resolution. Unfortunately, strong probe effects and variation introduced by sample preparation procedures have made single-point copy number estimates too imprecise to be useful. CGH arrays use a two-color hybridization, usually comparing a sample of interest to a reference sample, which to some degree removes the probe effect. However, the resolution is not nearly high enough to provide single-point copy number estimates. Various groups have proposed statistical procedures that pool data from neighboring locations to successfully improve precision. However, these procedures need to average across relatively large regions to work effectively, thus greatly reducing the resolution. Recently, regression-type models that account for probe effect have been proposed and appear to improve accuracy as well as precision. In this paper, we propose a mixture model solution specifically designed for single-point estimation that provides various advantages over the existing methodology. We use a 314-sample database, constructed with public datasets, to motivate and fit models for the conditional distribution of the observed intensities given allele specific copy numbers. With the estimated models in place we can compute posterior probabilities that provide a useful prediction rule as well as a confidence measure for each call. Software to implement this procedure will be available in the Bioconductor oligo package (http://www.bioconductor.org)
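The posterior-probability step described in the abstract above can be illustrated with a simple Gaussian mixture. This is a sketch under strong assumptions, not the model in the paper or the oligo package API: it uses a single log-ratio intensity per probe and total copy number classes rather than allele-specific ones, and the priors, means, and standard deviations are made up.

```python
from statistics import NormalDist


def copy_number_posterior(intensity, components):
    """Posterior P(copy number | intensity) under a Gaussian mixture.

    `components` maps copy number -> (prior, mean, sd). All values used
    below are hypothetical, chosen only to illustrate the calculation.
    """
    weighted = {
        cn: prior * NormalDist(mu, sd).pdf(intensity)
        for cn, (prior, mu, sd) in components.items()
    }
    total = sum(weighted.values())
    return {cn: w / total for cn, w in weighted.items()}


# Hypothetical log2-ratio model: one copy near -1, two near 0, three near 0.58.
components = {
    1: (0.1, -1.0, 0.3),
    2: (0.8, 0.0, 0.3),
    3: (0.1, 0.58, 0.3),
}

posterior = copy_number_posterior(-0.9, components)
call = max(posterior, key=posterior.get)
print(call, round(posterior[call], 3))
```

The most probable class serves as the prediction rule, and its posterior probability doubles as the confidence measure for that call.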

    Computing Power and Sample Size for Case-Control Association Studies with Copy Number Polymorphism: Application of Mixture-Based Likelihood Ratio Test

    Recent studies suggest that copy number polymorphisms (CNPs) may play an important role in disease susceptibility and onset. Currently, the detection of CNPs mainly depends on microarray technology. For case-control studies, conventionally, subjects are assigned to a specific CNP category based on the continuous quantitative measure produced by microarray experiments, and cases and controls are then compared using a chi-square test of independence. The purpose of this work is to specify the likelihood ratio test statistic (LRTS) for case-control sampling design based on the underlying continuous quantitative measurement, and to assess its power and relative efficiency (as compared to the chi-square test of independence on CNP counts). The sample size and power formulas of both methods are given. For the latter, the CNPs are classified using the Bayesian classification rule. The LRTS is more powerful than this chi-square test for the alternatives considered, especially alternatives in which the at-risk CNP categories have low frequencies. An example of the application of the LRTS is given for a comparison of CNP distributions in individuals of Caucasian or Taiwanese ethnicity, where the LRTS appears to be more powerful than the chi-square test, possibly due to misclassification of the most common CNP category into a less common category
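For reference, the conventional comparison that the abstract above contrasts with the LRTS, a chi-square test of independence on CNP category counts, can be sketched as below. This computes only the Pearson statistic for a 2 x k table and is not the LRTS itself. The counts are hypothetical.

```python
def chi_square_statistic(cases, controls):
    """Pearson chi-square statistic for a 2 x k table of CNP category counts."""
    table = [cases, controls]
    row_totals = [sum(row) for row in table]
    col_totals = [a + b for a, b in zip(cases, controls)]
    n = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts of subjects assigned to three CNP categories.
cases = [10, 30, 60]
controls = [20, 30, 50]
stat = chi_square_statistic(cases, controls)
print(f"chi-square = {stat:.4f} on {len(cases) - 1} degrees of freedom")
```

Because this test works on hard category assignments, any misclassification of the continuous measurement feeds straight into the counts, which is the weakness the abstract attributes to it.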